MonolayerCultures / src / Oligodendroglia / [RC17]Slingshot for Oligodendroglia.Rmd
[RC17]Slingshot for Oligodendroglia.Rmd
Raw
---
title: '[RC17]Slingshot for Oligodendroglia'
author: "Nina-Lydia Kazakou"
date: "17/03/2022"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


Proposed in (Street et al. 2017) 

The goal of slingshot is to use clusters of cells to uncover global structure and convert this structure into smooth lineages represented by one-dimensional variables, called pseudotime.

Slingshot is a two-step process composed of identifying the global lineage structure with a cluster-based minimum spanning tree (MST) and fitting simultaneous principal curves to describe each lineage.

These two steps can be run separately with the getLineages and getCurves functions, or together with the wrapper function, slingshot (recommended). We will use the wrapper function for the analysis of the single-trajectory dataset, but demonstrate the usage of the individual functions later, on the bifurcating dataset.

The slingshot wrapper function performs both steps of trajectory inference in a single call. The necessary inputs are a reduced dimensional matrix of coordinates and a set of cluster labels. These can be separate objects or, in the case of the single-trajectory data, elements contained in a SingleCellExperiment object.

# Set-up

### Load libraries
```{r message=FALSE, warning=TRUE}
library(slingshot)
library(scales)
library(Seurat)
library(ggplot2)
library(RColorBrewer)
library(ggsci)
library(here)
library(tidymodels)
```

### Colour Palette

```{r load_palette}
mypal <- pal_npg("nrc", alpha = 0.7)(10)
mypal2 <-pal_tron("legacy", alpha = 0.7)(7)
mypal3<-pal_lancet("lanonc", alpha = 0.7)(9)
mypal4<-pal_simpsons(palette = c("springfield"), alpha = 0.7)(16)
mypal5 <- pal_rickandmorty(palette = c("schwifty"), alpha = 0.7)(6)
mypal6 <- pal_futurama(palette = c("planetexpress"), alpha = 0.7)(5)
mypal7 <- pal_startrek(palette = c("uniform"), alpha = 0.7)(5)
mycoloursP<- c(mypal, mypal2, mypal3, mypal4)
```


### Load Object 
```{r}
oligos <- readRDS(here("data", "Processed", "Oligodendroglia", "hES-derived_monolayerOL.rds"))
Idents(oligos) <- "ClusterID"
```

```{r}
DimPlot(oligos, group.by = "ClusterID", pt.size = 0.9, label = TRUE, cols = mycoloursP[6:40]) & NoLegend()
```

```{r}
sce <- as.SingleCellExperiment(oligos)
```


# Slingshot
Running slingshot is a two-step process composed of identifying the global lineage structure with a cluster-based minimum spanning tree (MST) and fitting simultaneous principal curves to describe each lineage.

```{r}
dir.create(here("outs", "Processed", "Oligodendroglia", "Pseudo-time_Analysis", "Slingshot"))
```

```{r}
# Use wrapper function 
sce <- slingshot(sce, clusterLabels = 'ClusterID', reducedDim = 'UMAP')
```

The output is an sce with slingshot results in the colData. All of the results are stored in a PseudotimeOrdering object, which is added to the colData of the original object and be accessed with colData(sce)$slingshot. 
All the inferred pseudotime variables (one per lineage) are added to the colData, individually. 

To extract all slingshot results in a single object, we will use SlingshotDataSet, objects that are primarily used for visualization, as several plotting methods are included with the package. 
```{r}
summary(sce$slingPseudotime_1)
```

```{r fig.height=6, fig.width=8}
colors <- colorRampPalette(brewer.pal(11,'Spectral')[-6])(100)

plotcol <- colors[cut(sce$slingPseudotime_1, breaks=100)]

plot(reducedDims(sce)$UMAP, col = plotcol, pch=16, asp = 1)
lines(SlingshotDataSet(sce), lwd=2, col='black')
```

We can also see how the lineage structure was initially estimated by the cluster-based minimum spanning tree:
```{r fig.height=6, fig.width=8}
plot(reducedDims(sce)$UMAP, col = mycoloursP[sce$ClusterID], pch=16, asp = 1)
lines(SlingshotDataSet(sce), lwd=2, type = 'lineages', col = 'black')
```


Slingshot also allows for the specification of known endpoints.
Clusters which are specified as terminal cell states will be constrained to have only one connection when the MST is constructed (i.e., they must be leaf nodes). This constraint could potentially impact how other parts of the tree are drawn.

```{r}
sds <- slingshot(Embeddings(oligos, "umap"), clusterLabels = oligos@meta.data$ClusterID, start.clus = "pri-OPC") # set pri-OPC as the starting cluster for the pseudotime

cl <- oligos@meta.data$ClusterID
cols <- mycoloursP

plot(slingReducedDim(sds), col = cols[cl], pch = 16, cex = 0.5) 
lines(SlingshotDataSet(sds), lwd = 2, type = 'lineages', col = 'black') 
```

We can see that the trajectory is not alter in response to setting a specific cluster as a starting point. I have also tried to use different clusters as a starting point for the trajectory, but every time I get the same output. 

## Differential gene expression along curves
### Get top highly variable genes
```{r}
top_hvg <- HVFInfo(oligos) %>% 
  mutate(., bc = rownames(.)) %>% 
  arrange(desc(variance.standardized)) %>% 
  top_n(300, variance.standardized) %>% 
  pull(bc)
```

### Prepare data for random forest
```{r}
dat_use <- t(GetAssayData(oligos, slot = "data")[top_hvg,])
dat_use_df <- cbind(slingPseudotime(sds)[,1], dat_use) # Do curve 1, so 1st columnn
colnames(dat_use_df)[1] <- "pseudotime"
dat_use_df <- as.data.frame(dat_use_df[!is.na(dat_use_df[,1]),])
```

### use tidymodels
#### generate training and validation dataset
```{r}
set.seed(123)
dat_split <- initial_split(dat_use_df)
dat_train <- training(dat_split)
dat_val <- testing(dat_split)
```

#### train model
```{r}
model <- rand_forest(mtry = 200, 
                     trees = 1400, 
                     min_n = 15, 
                     mode = "regression") %>%
  set_engine("ranger", 
             importance = "impurity", 
             num.threads = 3) %>%
  fit(pseudotime ~ ., data = dat_train)
```

#### Evaluate model with validation set
```{r}
val_results <- dat_val %>% 
  mutate(estimate = predict(model, .[,-1]) %>% pull()) %>% 
  dplyr::select(truth = pseudotime, estimate)
```


###### Identify top 20 genes
```{r}
var_imp <- sort(model$fit$variable.importance, decreasing = TRUE)
top_genes <- names(var_imp)[1:40]
```


###### Plot top 20 genes
```{r, fig.height = 10, fig.width = 10}
colors_a <- colorRampPalette(brewer.pal(11,'Spectral')[-6])(100)

par(mfrow = c(3, 2))
for (i in seq_along(top_genes)) {
  colors <- colors_a[cut(dat_use[,top_genes[i]], breaks = 100)]
  plot(slingReducedDim(sds), col = colors, 
       pch = 16, cex = 0.5, main = top_genes[i])
  lines(SlingshotDataSet(sds), lwd = 2, col = 'black', type = 'lineages')
}
```

```{r eval=FALSE}
saveRDS(sce, here("outs", "Processed", "Oligodendroglia", "Pseudo-time_Analysis", "Slingshot", "slingshot.sce.rds"))
```

I will also try Slingshot on different time-points, since I've noticed in the past that the trajectory is altered between oligodendroglia that has been cultured for one week and oligodendroglia that has been cultured for longer. 

```{r}
Idents(oligos) <- "Sample"
wk_1 <- subset(oligos, ident = "wk1")
wk_later <- subset(oligos, ident = c("wk2", "wk3"))
```

```{r}
wk_1_sce <- as.SingleCellExperiment(wk_1)
wk_later_sce <- as.SingleCellExperiment(wk_later)
```

```{r}
wk_1_sce <- slingshot(wk_1_sce, clusterLabels = 'ClusterID', reducedDim = 'UMAP')

colors <- colorRampPalette(brewer.pal(11,'Spectral')[-6])(100)

plotcol <- colors[cut(wk_1_sce$slingPseudotime_1, breaks=100)]

plot(reducedDims(wk_1_sce)$UMAP, col = mycoloursP[wk_1_sce$ClusterID], pch=16, asp = 1)
lines(SlingshotDataSet(wk_1_sce), lwd=2, type = 'lineages', col = 'black')
```
```{r}
sds_wk1 <- slingshot(Embeddings(wk_1, "umap"), clusterLabels = wk_1@meta.data$ClusterID, start.clus = "pri-OPC") # set pri-OPC as the starting cluster for the pseudotime

cl <- wk_1@meta.data$ClusterID
cols <- mycoloursP

plot(slingReducedDim(sds_wk1), col = cols[cl], pch = 16, cex = 0.5) 
lines(SlingshotDataSet(sds_wk1), lwd = 2, type = 'lineages', col = 'black') 
```

```{r}
wk_later_sce <- slingshot(wk_later_sce, clusterLabels = 'ClusterID', reducedDim = 'UMAP')

colors <- colorRampPalette(brewer.pal(11,'Spectral')[-6])(100)

plotcol <- colors[cut(wk_later_sce$slingPseudotime_1, breaks=100)]

plot(reducedDims(wk_later_sce)$UMAP, col = mycoloursP[wk_later_sce$ClusterID], pch=16, asp = 1)
lines(SlingshotDataSet(wk_later_sce), lwd=2, type = 'lineages', col = 'black')
```

```{r}
sds_later <- slingshot(Embeddings(wk_later, "umap"), clusterLabels = wk_later@meta.data$ClusterID, start.clus = "pri-OPC") # set pri-OPC as the starting cluster for the pseudotime

cl <- wk_later@meta.data$ClusterID
cols <- mycoloursP

plot(slingReducedDim(sds_later), col = cols[cl], pch = 16, cex = 0.5) 
lines(SlingshotDataSet(sds_later), lwd = 2, type = 'lineages', col = 'black') 
```


```{r}
sessionInfo()
```


The Slingshot package has recently been updated and changed some of its functions. Here, I also attach the Slingshot script from the original analysis done on 05/04/2021. The trajectory output is not different between the original analysis and this one. 

```{r eval=FALSE}
Idents(oligos) <- "RNA_snn_res.0.7"
line4 <- subset(oligos, ident = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11))
```

### Slingshot does not support ggplot2. Workaround:
```{r eval=FALSE}
cell_pal <- function(cell_vars, pal_fun,...) {
  if (is.numeric(cell_vars)) {
    pal <- pal_fun(100, ...)
    return(pal[cut(cell_vars, breaks = 100)])
  } else {
    categories <- sort(unique(cell_vars))
    pal <- setNames(pal_fun(length(categories), ...), categories)
    return(pal[cell_vars])
  }
}
```

### Run Slingshot
```{r eval=FALSE}
sds_line4 <- slingshot(Embeddings(line4, "umap"), clusterLabels = line4$RNA_snn_res.0.7, start.clus = 1, stretch = 0) #give it Pri-OPCs as starting point
cell_colors_clust <- cell_pal(line4$RNA_snn_res.0.7, hue_pal())
plot(reducedDim(sds_line4), col = cell_colors_clust, pch = 16, cex = 0.5)
lines(sds_line4, lwd = 2, type = 'lineages', col = 'black')
```

### Convert into sce.object
```{r eval=FALSE}
line4_sce <- as.SingleCellExperiment(line4)
```

```{r eval=FALSE}
line4_sce <- slingshot(line4_sce, clusterLabels = 'RNA_snn_res.0.7', 
                     reducedDim = 'UMAP',
                     start.clus = 1, stretch = 0)
summary(line4_sce$slingPseudotime_1)

colors <- colorRampPalette(brewer.pal(11,'Spectral')[-6])(100)
plotcol <- colors[cut(line4_sce$slingPseudotime_1, breaks=100)]
plot(reducedDims(line4_sce)$UMAP, col = plotcol, pch=16, asp = 1)
lines(SlingshotDataSet(line4_sce), lwd=2, col='black')
```

### Run Slingshot on PCA
```{r eval=FALSE}
sds_pca <- slingshot(Embeddings(line4, "pca"), clusterLabels = line4$RNA_snn_res.0.7, start.clus = 1, stretch = 0)

cell_colors_clust <- cell_pal(line4$RNA_snn_res.0.7, hue_pal())
plot(reducedDim(sds_pca), col = cell_colors_clust, pch = 16, cex = 0.5)
lines(sds_pca, lwd = 2, type = 'lineages', col = 'black')

DimPlot(line4, reduction = "pca", group.by = "CLUSTERS", label  = T)
```